Add Amazon Polly and ElevenLabs TTS support #108
Open
xermitik wants to merge 1 commit into
Pull Request: Amazon Polly & ElevenLabs TTS Support
Overview
This PR adds support for two cloud TTS providers — Amazon Polly and ElevenLabs — as online voice sources in NaturalVoiceSAPIAdapter. Both providers are fully integrated into the existing SAPI voice enumeration and synthesis pipeline and are configurable through the installer UI.
Additionally, several bug fixes are included (see below).
Related issues:
New Features
Amazon Polly
Files added:
- NaturalVoiceSAPIAdapter/AmazonPollyAPI.h/.cpp: HTTP client for the Polly REST API:
  - GET /v1/voices, iterates NextToken)
  - <prosody> tags before sending
- Installer/PollyKeyDlg.cpp: installer dialog for entering AWS Access Key ID, Secret Key, region, and engine type
Files modified:
- NaturalVoiceSAPIAdapter/TTSEngine.h/.cpp: InitPollyVoice, SetupPollyEvents; dispatch in SpeakAsync/Stop
- NaturalVoiceSAPIAdapter/VoiceTokenEnumerator.cpp: EnumPollyVoices fetches and filters voices by language, creates SAPI tokens (Polly;Cloud, registry key Polly-{VoiceId}, credentials stored in NaturalVoiceConfig)
- NaturalVoiceSAPIAdapter/NaturalVoiceSAPIAdapter.vcxproj + .filters: added new source files
- Installer/Installer.rc: "Enable Amazon Polly online voices" checkbox, "Set Polly keys…" button, IDD_POLLYKEY dialog (bilingual CN/EN)
- Installer/resource.h: new resource IDs: IDD_POLLYKEY, IDC_CHK_POLLY_VOICES, IDC_SET_POLLY_KEY, IDC_POLLY_ACCESS_KEY, IDC_POLLY_SECRET_KEY, IDC_POLLY_REGION, IDC_POLLY_ENGINE
- Installer/MainDlg.cpp: UpdateEnableStates/UpdateDisplay/SaveChanges for Polly; handler for IDC_SET_POLLY_KEY
- Installer/Installer.vcxproj + .filters: added PollyKeyDlg.cpp
Registry keys (HKCU\Software\NaturalVoiceSAPIAdapter\Enumerator):
- NoPollyVoices: 1 = disable Polly voice enumeration
- PollyAccessKey
- PollySecretKey
- PollyRegion: us-east-1
- PollyEngine: neural / standard / long-form / generative
ElevenLabs
Files added:
- NaturalVoiceSAPIAdapter/ElevenLabsAPI.h/.cpp: HTTP client for the ElevenLabs REST API:
  - xi-api-key HTTP header
  - output_format=pcm_24000), no decoder required
  - SsmlToPlainText)
  - {"detail": {"message": "..."}} and {"detail": "..."}
  - GET /v2/voices?page_size=100, loops via next_page_token until has_more=false)
  - verified_languages[0].locale → labels["language"] (ISO 639-1 → BCP-47 table, ~30 languages) → en-US
- Installer/ElevenLabsKeyDlg.cpp: installer dialog for entering API key and selecting model; includes a link to https://elevenlabs.io/app/settings/api-keys
Files modified:
- NaturalVoiceSAPIAdapter/TTSEngine.h/.cpp: InitElevenLabsVoice, SetupElevenLabsEvents; dispatch in SpeakAsync/Stop
- NaturalVoiceSAPIAdapter/VoiceTokenEnumerator.cpp: EnumElevenLabsVoices fetches voices, creates SAPI tokens (ElevenLabs;Cloud, registry key ElevenLabs-{VoiceId}, credentials stored in NaturalVoiceConfig)
- NaturalVoiceSAPIAdapter/NaturalVoiceSAPIAdapter.vcxproj + .filters: added new source files
- NaturalVoiceSAPIAdapter/pch.h: added #include <cwctype> (required for std::towupper in locale detection)
- Installer/Installer.rc: "Enable ElevenLabs online voices" checkbox, "Set ElevenLabs key…" button, IDD_ELEVENKEY dialog; IDD_MAIN height extended to accommodate new controls (bilingual CN/EN)
- Installer/resource.h: new resource IDs: IDD_ELEVENKEY, IDC_CHK_ELEVENLABS_VOICES, IDC_SET_ELEVENLABS_KEY, IDC_ELEVENLABS_LINK, IDC_ELEVENLABS_API_KEY, IDC_ELEVENLABS_MODEL
- Installer/MainDlg.cpp: UpdateEnableStates/UpdateDisplay/SaveChanges for ElevenLabs; handler for IDC_SET_ELEVENLABS_KEY
- Installer/Installer.vcxproj + .filters: added ElevenLabsKeyDlg.cpp
Registry keys (HKCU\Software\NaturalVoiceSAPIAdapter\Enumerator):
- NoElevenLabsVoices: 1 = disable ElevenLabs voice enumeration
- ElevenLabsApiKey: xi-api-key value
- ElevenLabsModel: eleven_multilingual_v2
Diagnostics
- Debug logging reports the selected provider voice/model/engine and received audio byte counts.
- Trace logging can include provider request bodies and truncated API error/list responses for troubleshooting.
- xi-api-key header.
Bug Fixes
- NaturalVoiceSAPIAdapter/TTSEngine.cpp: fixed invalid SSML sent to Azure when the caller (e.g. .NET System.Speech) wraps its input in a <speak> root element. SAPI forwards such tags via SPVA_ParseUnknownTag when it does not recognise the namespace/version attributes, which previously caused a nested <speak> to appear in the SSML payload. The fix detects and skips any <speak> tag in the unknown-tag path before appending it to the SSML being built.
- NaturalVoiceSAPIAdapter/Mp3Decoder.cpp: set minimum ACM stream buffer to 16 384 bytes; fixes a crash when Polly returns a short MP3 clip
- NaturalVoiceSAPIAdapter/TaskScheduler.h: thread-safe one-time initialization via std::call_once; task tracking by TaskHandle instead of raw pointer
- NaturalVoiceSAPIAdapter/WSConnectionPool.cpp: moved connectionChanged.notify_all() before RemoveConnection() in close/error handlers; eliminates a race condition on connection teardown
- Installer/MainDlg.cpp: Azure and Polly checkboxes now require all mandatory credentials before showing those providers as enabled
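For reviewers, the <speak> fix in TTSEngine.cpp can be pictured roughly as below. This is a minimal sketch and not the code in this PR: IsSpeakTag and AppendUnknownTag are hypothetical names, the tag text is assumed to arrive as a wide-string fragment, and the match is assumed to be case-sensitive because XML element names are. It requires C++17 for std::wstring_view.

```cpp
#include <cstddef>
#include <cwctype>
#include <string>
#include <string_view>

// Hypothetical helper (not taken from the PR): returns true when an unknown-tag
// fragment is an opening or closing <speak> element.
bool IsSpeakTag(std::wstring_view tag)
{
    std::size_t i = 0;
    // Skip leading whitespace before the '<'.
    while (i < tag.size() && std::iswspace(tag[i]))
        ++i;
    if (i >= tag.size() || tag[i] != L'<')
        return false;
    ++i;
    // Accept the closing form "</speak>" as well.
    if (i < tag.size() && tag[i] == L'/')
        ++i;

    constexpr std::wstring_view name = L"speak";
    if (tag.substr(i, name.size()) != name)
        return false;
    i += name.size();

    // The name must end here, so that e.g. "<speaker>" is not matched.
    if (i == tag.size())
        return true;
    const wchar_t c = tag[i];
    return c == L'>' || c == L'/' || std::iswspace(c) != 0;
}

// Hypothetical helper: appends an unknown tag to the SSML being built,
// dropping <speak> tags so the payload never contains a nested root element.
void AppendUnknownTag(std::wstring& ssml, std::wstring_view tag)
{
    if (IsSpeakTag(tag))
        return;
    ssml.append(tag);
}
```

With helpers like these, a fragment such as <speak version="1.0"> forwarded through the unknown-tag path would be dropped, while other unknown tags (for example <break/>) would still be appended unchanged.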